{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Overview of GCG\n",
    "\n",
    "[GCG (Gradient-based Candidate Generation)](https://arxiv.org/abs/2307.15043) is a sophisticated jailbreaking attack method based on optimization algorithms. It aims to exploit the vulnerabilities of language models by intelligently manipulating the tokens in the `jailbreak_prompt`. By calculating gradients and iteratively replacing tokens, GCG minimizes the loss in the `reference_response` to craft prompts that can induce unexpected model behaviors."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Detailed Explanation of the GCG Method\n",
    "\n",
    "The GCG method operates by focusing on the `jailbreak_prompt` tokens. It involves the following steps:\n",
    "\n",
    "- **Gradient Calculation:** Computes the gradient of the one-hot vectors of tokens in the `jailbreak_prompt`.\n",
    "- **Token Replacement:** Randomly selects a position in `jailbreak_prompt` and replaces the token at that position with one from the `top_k` tokens with the smallest gradient values.\n",
    "- **Sample Generation and Selection:** Generates `num_turb_sample` new samples and selects the one with the lowest loss in the `reference_response`.\n",
    "- **Iteration and Convergence:** Repeats the process for up to `max_num_iter` iterations or until the `target_response` prefix matches the `reference_response` exactly."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. How to Use GCG?\n",
    "The following steps guide you through setting up your environment, preparing the dataset, initiating the GCG attack, and saving the results. Each step is associated with corresponding code snippets, which you should run in your Jupyter Notebook cells."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1 Setting up the Environment\n",
    "\n",
    "Before starting, ensure that you have installed the necessary packages, set up logging, and seeded the environment for reproducibility.\n",
    "\n",
    "The logging level is set up to be `INFO`. It can be changed to `DEBUG` for more details of the inner progress."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import easyjailbreak\n",
    "import torch\n",
    "import logging\n",
    "from transformers import set_seed\n",
    "\n",
    "# Set up logging format and seed\n",
    "logging.basicConfig(level=logging.INFO, format='[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s')\n",
    "set_seed(42)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 Preparing the Dataset and Model\n",
    "Load model, tokenizer, and dataset. The dataset is sliced to the first 10 entries for demonstration purposes. Adjust the slice as needed for your use case."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `easyjailbreak.models.from_pretrained` method provides a convenient, one-click feature to load Hugging Face models. It returns an instance of the `easyjailbreak.models.HuggingfaceModel` class. This class packages the model and tokenizer together, ensuring unified, convenient management and use.\n",
    "\n",
    "The parameter `'vicuna_v1.1'` might catch your attention. This is used to indicate the specific version or type of the model. Based on this parameter, the method employs the `fastchat` library to obtain specific conversational formats tailored for the model.\n",
    "\n",
    "For more information about the possible model options and templates, and to understand how you can leverage them for your specific use-case, refer to the `FastChat` library's `conversation.py`. This resource is a treasure trove of information on different conversational formats and how they can be applied: [FastChat conversation.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load model and tokenizer\n",
    "model = easyjailbreak.models.from_pretrained(\n",
    "    'lmsys/vicuna-7b-v1.5', \n",
    "    'vicuna_v1.1', \n",
    "    dtype=torch.bfloat16, \n",
    "    max_new_tokens=40)\n",
    "\n",
    "# Load dataset\n",
    "dataset_name = 'AdvBench'\n",
    "dataset = easyjailbreak.datasets.JailbreakDataset(dataset_name)[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3 Initiating and launching the Attack\n",
    "Initialize the API class with the required settings and start the attack process. Adjust the `batchsize` if you encounter Out of Memory (OOM) issues."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# API class initialization and settings\n",
    "attacker = easyjailbreak.attacker.GCG(\n",
    "    attack_model=model,\n",
    "    target_model=model,\n",
    "    jailbreak_datasets=dataset,\n",
    "    jailbreak_prompt_length=20,\n",
    "    num_turb_sample=512,\n",
    "    batchsize=256,      # decrease if OOM\n",
    "    top_k=256,\n",
    "    max_num_iter=500\n",
    ")\n",
    "\n",
    "# Launch the attack\n",
    "attacker.attack()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.4 Saving the Results\n",
    "After the attack, save the results for analysis and review."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output_file = 'gcg_result.jsonl'\n",
    "attacker.jailbreak_datasets.save_to_jsonl(output_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Additional Resources\n",
    "The provided example/run_gcg.py script demonstrates a complete workflow, from setting up the environment to saving the attack results. It showcases how to use the GCG method effectively with the easyjailbreak package."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
